Reality Mining by Nathan Eagle & Kate Greene
Author: Nathan Eagle & Kate Greene
Language: eng
Format: epub, pdf
Tags: Data Mining, Computer Science, Sociology, Human-Computer Interaction
ISBN: 9780262027687
Publisher: The MIT Press
Published: 2014-08-10T16:00:00+00:00
IV. The Nation (1 Million to 100 Million People)
7. Taking the Pulse of a Nation: Census, Mobile Phones, and Internet Giants
As Reality Mining scales up, national governments, large companies, and international organizations begin to play a crucial role in the collection, compilation, and dissemination of data. At this national scale, researchers and entrepreneurs can gain access to a wide range of data sources, including national censuses; call records; major Internet companies such as Google, Facebook, and Twitter; and, to a limited extent, banks. Of course, some of these data are more readily available than others.
Census data are by far the easiest to acquire. Many nations make their census findings public via websites from which data can be downloaded and visualized for further analysis. In addition, the World Bank conducts international surveys and compiles census data from all participating nations—a sort of one-stop shop for information on its member countries. These data are publicly accessible: they can be downloaded and independently sorted and analyzed. Importantly, the World Bank offers an open API that allows programmers to integrate various data into software applications. Using World Bank data, Google has integrated a simple visualization tool into its search results; a search query on the population of Botswana will pull up the number, the dated World Bank source, and a graph showing population change over decades.
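As a rough illustration of what integrating World Bank data into an application can look like, the sketch below builds a request for one indicator and flattens the reply into a year-to-value mapping. It is an assumption-laden example, not something taken from the book: the base URL, the indicator code `SP.POP.TOTL` (total population), the country code `BWA` (Botswana), and the two-element `[metadata, records]` response layout reflect the public Indicators API as commonly documented, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the World Bank Indicators API (version 2).
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(country_code, indicator, per_page=100):
    """Build a request URL for one indicator in one country."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{BASE_URL}/country/{country_code}/indicator/{indicator}?{query}"


def parse_indicator_response(payload):
    """Convert the assumed [metadata, records] reply into {year: value}.

    Records without a value are skipped; an error reply or an empty
    result yields an empty dict.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    series = {}
    for record in payload[1]:
        if record.get("value") is not None:
            series[int(record["date"])] = record["value"]
    # Return the series in chronological order.
    return dict(sorted(series.items()))


def fetch_indicator(country_code, indicator):
    """Download and parse one series (requires network access)."""
    url = build_indicator_url(country_code, indicator)
    with urlopen(url, timeout=30) as response:
        return parse_indicator_response(json.load(response))
```

With network access, `fetch_indicator("BWA", "SP.POP.TOTL")` would be expected to return Botswana's population by year, the same kind of series behind the search-result graph described above. Keeping URL construction and parsing separate from the download makes both easy to check offline.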
Another emerging source of data, especially useful for understanding the mobility of people within a country or region, is call data records or call detail records (CDRs). Unlike census and World Bank data, however, CDRs are much more difficult for the average entrepreneur or researcher to access.
A CDR data set contains a log of communication events (calls and text messages) and transactional events, including information about the caller or sender and the recipient as well as the time, location, and, in the case of calls, duration. Historically, CDRs have been used exclusively for billing purposes, but starting in 2005 researchers at network service providers and universities began to recognize the value of such data, especially for modeling human mobility. A few researchers and entrepreneurs have agreements to use some mobile carrier data with various limitations. Some mobile carriers are willing to share anonymized CDRs as long as legal agreements specify that no proprietary or personal information will ever be made public. An additional stipulation of such agreements might be that researchers demonstrate to the operators the value of their proposed analysis, for example by developing predictive models of product adoption or of “churn” (subscription termination).
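The fields listed above map naturally onto a small record type. The following is a minimal sketch under stated assumptions, not an operator's actual schema: the field names, the use of a cell tower ID as the location, and the keyed-hash pseudonymization are all illustrative choices. It shows one way identifiers could be replaced before sharing, and a very simple mobility measure (the number of distinct locations each caller appears at).

```python
import hashlib
import hmac
from collections import defaultdict
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CallDetailRecord:
    """One logged event; field names are hypothetical."""
    caller: str
    recipient: str
    timestamp: datetime
    cell_id: str                      # assumed location proxy: serving tower
    kind: str                         # "call" or "sms"
    duration_s: Optional[int] = None  # only meaningful for calls


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, truncated).

    The same input always maps to the same token, so records can still
    be linked per subscriber. This is pseudonymization only; it does not
    by itself guarantee anonymity.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(record: CallDetailRecord, secret_key: bytes) -> CallDetailRecord:
    """Return a copy of the record with both parties pseudonymized."""
    return replace(
        record,
        caller=pseudonymize(record.caller, secret_key),
        recipient=pseudonymize(record.recipient, secret_key),
    )


def distinct_locations(records):
    """Count the distinct cells each caller was observed at."""
    cells_by_caller = defaultdict(set)
    for record in records:
        cells_by_caller[record.caller].add(record.cell_id)
    return {caller: len(cells) for caller, cells in cells_by_caller.items()}
```

Because the hash is keyed and deterministic, `distinct_locations` gives the same counts on pseudonymized records as on the raw ones, which is the property a mobility study would need from shared data.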
The national scale is also where we first address the importance of the major Internet data collectors Google, Facebook, and Twitter. These companies have profound effects on the individuals, communities, and governments who use them, but their power as tools of mass data collection becomes apparent at this scale. The obvious application of data collected by Internet companies is targeted advertising. But the data provide other untapped opportunities, including the chance for targeted market-research surveys and disease tracking (see chapter 10).